Recognizing Sloppy Speech
Abstract
As speech recognition moves from the lab into the real world, sloppy speech emerges as a major challenge. Sloppy speech, or conversational speech, is the speaking style people typically use in daily conversation. Its recognition error rate has been found to be double that of read speech in many circumstances. Previous work on sloppy speech has focused on modeling pronunciation changes, primarily by adding pronunciation variants to the dictionary. The resulting improvement, unfortunately, has been unsatisfactory.

To improve recognition performance on sloppy speech, we revisit pronunciation modeling and focus on implicit pronunciation modeling, in which we keep the dictionary simple and model reductions through phonetic decision trees and other acoustic modeling mechanisms. A second front of this thesis is to alleviate known limitations of the current HMM framework, such as the frame independence assumption, which can be aggravated by sloppy speech. Three novel approaches have been explored:

• Flexible parameter tying: We show that parameter tying is an integral part of pronunciation modeling, and introduce flexible tying to better model reductions in sloppy speech. We find that enhanced tree clustering, together with a single-pronunciation dictionary, improves performance significantly.
• Gaussian transition modeling: By modeling transitions between Gaussians in adjacent states, this approach alleviates the frame independence assumption and can be regarded as a pronunciation network at the Gaussian level.
• Thumbnail features: We try to achieve segmental modeling within the HMM framework by using these segment-level features. While they improve performance significantly in initial passes, the gain becomes marginal when combined with more sophisticated acoustic modeling techniques.

We have also worked on system development for three large-vocabulary tasks: Broadcast News, Switchboard and meeting transcription. By empirically improving all aspects of speech recognition, from front-ends to acoustic modeling and decoding strategies, we have achieved a 50% relative improvement on the Broadcast News task, a 38% relative improvement on the Switchboard task, and a 40% relative improvement on the meeting transcription task.
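The gains above are stated as relative improvements, which are conventionally read as the reduction in error rate divided by the baseline error rate. The abstract does not give the underlying absolute error rates, so the following is only a minimal sketch of that arithmetic: the function name `relative_improvement` and the numbers used are hypothetical and chosen purely for illustration.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the error reduction as a fraction of the baseline error rate."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical word error rates (in percent); NOT taken from the thesis.
baseline, improved = 40.0, 20.0
print(f"{relative_improvement(baseline, improved):.0%}")  # prints "50%"
```

Under this reading, a 50% relative improvement means the error rate was halved, whatever the starting point was.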
Similar resources
Unsupervised estimation of the language model scaling factor
This paper addresses the adjustment of the language model (LM) scaling factor of an automatic speech recognition (ASR) system for a new domain using only un-transcribed speech. The main idea is to replace the (unavailable) reference transcript with an automatic transcript generated by an independent ASR system, and adjust parameters using this sloppy reference. It is shown that despite its fair...
Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model
Speech is one of the most opulent and instant methods to express emotional characteristics of human beings, which conveys the cognitive and semantic concepts among humans. In this study, a statistical-based method for emotional recognition of speech signals is proposed, and a learning approach is introduced, which is based on the statistical model to classify internal feelings of the utterance....
Sloppy Identity
Although sloppy interpretation is usually accounted for by theories of ellipsis, it often arises in non-elliptical contexts. In this paper, a theory of sloppy interpretation is provided which captures this fact. The underlying idea is that sloppy interpretation results from a semantic constraint on parallel structures and the theory is shown to predict sloppy readings for deaccented and paychec...
Sloppy identity (arXiv:cmp-lg/9705002v1, 1 May 1997)
Although sloppy interpretation is usually accounted for by theories of ellipsis, it often arises in non-elliptical contexts. In this paper, a theory of sloppy interpretation is provided which captures this fact. The underlying idea is that sloppy interpretation results from a semantic constraint on parallel structures and the theory is shown to predict sloppy readings for deaccented and paychec...
Robot Arm Performing Writing through Speech Recognition Using Dynamic Time Warping Algorithm
This paper aims to develop a writing robot that recognizes speech signals from the user. The robot arm is constructed mainly for disabled people who cannot write on their own. Here, the dynamic time warping (DTW) algorithm is used to recognize the speech signal from the user. The action performed by the robot arm in the environment is done by reducing the redundancy which frequently fac...